I am a senior research fellow at the National University of Singapore, working with Prof. Mong-Li Lee and Prof. Wynne Hsu at IDS, and with Prof. Tat-Seng Chua at NExT++. Previously, I was an associate researcher at Skywork AI Singapore, working with Prof. Shuicheng Yan, and before that an associate researcher at Sea AI Lab. I received my Ph.D. from Wuhan University.
My research has been published in top-tier ML/NLP/CV/MM venues, e.g., ICML, NeurIPS, ACL, CVPR, AAAI, WWW, SIGIR, IJCAI, EMNLP, ACM MM, TPAMI, TKDE, TOIS, TNNLS, and TASLP. I received the World AI Conference (WAIC) Rising Star award in 2023 and was ranked among the World's Top 2% Scientists 2024 (Single Year) by Stanford University. My papers have been selected as Most Influential Papers by Paper Digest and as ESI Highly Cited Papers, and one received the 2024 WAIC Outstanding Paper Award. I regularly serve as a (Senior) Area Chair or Senior Program Committee member for top-tier conferences, and served on the organizing committees of WSDM 2022, EMNLP 2023, ACL 2024, and ACM MM 2025. I serve as an Associate Editor for journals including TALLIP and Neurocomputing, and I am a regularly invited reviewer for many journals, including TPAMI, IJCV, TNNLS, TKDE, and TOIS. My Ph.D. thesis was awarded the Excellent Doctoral Thesis of the Chinese Information Processing Society of China (CIPS), and I received more than ten honors and awards during my Ph.D. studies.
My research interests lie in NLP, CV, and their intersection (i.e., multimodal/vision-language learning). My long-term goal is to achieve human-level AI centered on multimodal LLMs and generalists. While I previously worked extensively on structural modeling of language and vision, my recent focus is on unified multimodal generalists toward human-level capacity (modality, task, knowledge) and cognition (reasoning, affection), with the following key topics and representative works (detailed in my research statement):
▶ Multimodal Foundation Models: Unified multimodal LLMs and generalists.
▶ Capacity: Comprehension/generation of modalities/tasks, knowledge acquisition.
▶ Cognition: Cross-modal neuro-symbolic reasoning, human-centric affective computing.
I am always looking for collaborations on the above topics; remote collaboration is also welcome, and I will provide sufficient GPUs for promising students. Feel free to reach out if you are a Ph.D./master's/bachelor's student interested in my current work. If you are from a Chinese university, there are also potential openings for research interns (e.g., self-funded or CSC-funded joint Ph.D. projects). Please describe your research status and attach your resume.
• We are holding the grand challenges of Multimodal Conversational Aspect-based Sentiment Analysis (PanoSent) and Avatar-based Multimodal Empathetic Conversation (AvaMERG) at ACM Multimedia 2025. Call for participation!
• 5 Apr 2025: We are holding the first MLLM for Unified Comprehension and Generation (MUCG 2025) workshop and the first Cognition-oriented Multimodal Affective and Empathetic Computing (CogMAEC 2025) workshop at ACM Multimedia 2025. Call for papers!
• 27 Mar 2025: We are holding the first Multimodal Knowledge and Language Modeling (MKLM 2025) workshop at IJCAI 2025. Call for papers!
• 24 Mar 2025: We are releasing the first survey on Multimodal Chain-of-Thought Reasoning; check it out on GitHub!
• 27 Feb 2025: Two papers are accepted by CVPR 2025: 1) Universal Scene Graph Generation and 2) 4D Scene Graph Generation. Congrats to all my co-authors!
• 8 Feb 2025: One paper on Multimodal Grammar Induction is accepted by the Journal of Artificial Intelligence!
• 22 Jan 2025: Two papers are accepted by ICLR 2025: 1) Semantic-equivalent Tokenization and 2) Cross-modal DPO. Congrats to all my co-authors!